ABSTRACT


Six artificial neural network models are explored as potential
methodologies for the automated interpretation of satellite
cloud images. The Multi-layer Perceptron Network is chosen as
the most applicable to the image interpretation problem. A
complete mathematical description of the methodology is presented.
The neural network's classification capability is demonstrated
using simple geometric patterns and alphabetic characters.
A more complex test using GOES infrared imagery shows that
the neural network can distinguish 53 of 54 large-scale cloud
patterns. An architecture for a complete, automated cloud feature
recognition system is proposed.
NEURAL NETWORK METHODOLOGIES AND THEIR POTENTIAL
APPLICATION  TO  CLOUD  PATTERN  RECOGNITION 

1.  Introduction 

Artificial  neural  network  models  or  "neural  nets"  have  been 
studied  for  many  years  as  potential  methods  for  solving  speech 
and  image  recognition  problems.  These  models  are  composed  of 
many  nonlinear  computational  elements  operating  in  parallel  in 
networks.  Artificial  neural  net  structure  is  based  on  present 
understanding  of  biological  nervous  systems. 

The neural network approach has many advantages. The parallel
structure of such networks provides very high computation
rates. Neural nets typically provide a greater degree of robustness
than sequential methods because they have many more processing
units. Thus, poor or missing input data to a few units does
not significantly impair the overall performance of the network.
Another advantage is that the connection weights can be adapted
during use of the algorithm, so that performance is continually
improved based on current results. In the interpretation of
image features such as clouds, the training sample can never
contain examples of every possible feature configuration. Neural
networks can continue to adapt as new examples are encountered.
Neural net classifiers are nonparametric and make weaker assumptions
about the shapes of underlying distributions than traditional
statistical classifiers. For this reason, neural nets work well
when distributions are non-Gaussian.

The automatic interpretation of satellite images would be a
powerful aid to Navy forecasters. A necessary step in the
interpretation process is to identify the cloud features that are
composed of collections of contiguous and noncontiguous cloudy
pixels. To accomplish this goal using standard pattern recognition
techniques would require enormous amounts of processing
resources and would likely have only limited success. In this
paper, neural nets are presented as a possible alternative way to
acquire the necessary processing capacity using networks of
simple processing elements operating in parallel.

In the next section, six different neural network configurations
that accomplish classification are presented. It will be
shown that the multi-layer perceptron configuration is most
applicable to the satellite image feature recognition problem.
In Section 3, the mathematical form of the multi-layer perceptron
network will be presented. An implementation of the algorithm,
called "NOARL-NEURAL," has been developed. The program has been
tested on simple problems, the results of which are in Section 4.
Based on the results of these tests, an architecture is proposed
for a comprehensive cloud image analysis system that uses neural
network, expert system and statistical methodologies in concert.
Finally, the conclusions of this preliminary research will be
presented.

2.  Types  of  Neural  Networks 

There are many different types of neural net models. The
various models are designed to accomplish different goals. For
example, some are designed to provide a parallel content-addressable
memory storage capacity. Similar networks can be used to
supply missing information for incomplete input sets.
Constraint-satisfaction neural networks find solutions to problems
with a large number of constraints. These networks are very
useful for problems in which there is no solution that matches
all of the constraints. In such cases, the solution satisfies as
many constraints as possible. It may be possible to use such a
methodology to satisfy the constraints of multi-channel atmospheric
profiling from satellite radiance data. Similarly, a
numerical model's cumulus or boundary-layer parameterization
could be handled by embedding neural networks in
the model. Pattern-associators are another powerful type of
neural network, in which learning from examples is accomplished.
Similar to pattern associators are the so-called auto-associators,
which detect regularity in a set of input patterns and can
restore distorted or incomplete input patterns to their original
form. Such networks have been used to filter noise from signals
or to make generalizations about examples (a sort of automatic
rule generation). Finally, biological neural researchers have
used neural methods to simulate human cognitive processes in an
attempt to understand human perception.

In this section, six neural models that accomplish pattern
classification will be examined with the goal of choosing the
best one to apply to the satellite image problem. The different
models can be specified by the network topology, node characteristics,
and training or learning rules. The network topology
specifies how the various artificial neurons, or "nodes," are
distributed and interconnected. The node characteristics define
how each node processes the signals it receives from other nodes,
and what signal it outputs. The learning rules define how the
weights are adjusted if the desired output is not attained.

The simplest artificial neuron or node sums N weighted
inputs and passes the result through a nonlinear function as
shown in Figure 1. The node is characterized by an internal
threshold or offset θ and by the type of nonlinearity. Three
such nonlinearities, shown in Figure 1, are the hard-limiter,
threshold logic and sigmoid functions. Very complex neural
models may include a time dependency and a more complex mathematical
operation than summation. In an image classifier, the
inputs might be the gray scale level of each pixel and the output
classes might represent different objects.


Figure  1.  Artificial  neuron  which  forms  a  weighted  sum  of  N 
inputs  and  passes  the  result  through  a  nonlinear  function.  Three 
types  of  nonlinearities  are  shown. 
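
The node just described can be sketched in a few lines of Python. This is only an illustration: the exact shapes of the three nonlinearities in Figure 1 are assumed here (a hard-limiter returning +1 or -1, a threshold-logic ramp clipped to the range 0 to 1, and a logistic sigmoid), and the input values and weights are hypothetical.

```python
import math

def hard_limiter(x):
    """Return +1 for non-negative input, otherwise -1 (assumed form)."""
    return 1.0 if x >= 0.0 else -1.0

def threshold_logic(x):
    """Linear ramp clipped to the range [0, 1] (assumed form)."""
    return min(1.0, max(0.0, x))

def sigmoid(x):
    """Logistic function, a smooth curve between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights, theta, nonlinearity):
    """Weighted sum of N inputs minus the offset theta, passed through f."""
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return nonlinearity(net)

pixels = [0.2, 0.9, 0.4]       # hypothetical gray-scale inputs
weights = [0.5, -1.0, 2.0]     # hypothetical connection weights
for f in (hard_limiter, threshold_logic, sigmoid):
    print(f.__name__, node_output(pixels, weights, 0.1, f))
```

The same weighted sum feeds all three functions; only the final squashing step differs, which is what distinguishes the node types used by the nets discussed in this section.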
In an excellent survey of neural networks, Lippman (1987)
summarized six neural network classifiers. A taxonomy of these
methodologies is presented in Figure 2. The six net types are
first divided into those that use binary versus continuous input
data values. Next, the nets are divided between those trained
with or without supervision. Supervised training occurs when the
correct output patterns are specified for each training case. In
unsupervised training, the network itself groups the training set
outputs, thus defining the output classes. The classical statistical
algorithms listed at the bottom of Figure 2 are those most
similar to the corresponding neural net in methodology or function.
The six methodologies will now be examined with the goal
of choosing the one most applicable to the satellite image
interpretation problem.


Figure 2. A taxonomy of six neural nets used for classification
problems (diagram title: "Neural Net Classifiers for Fixed
Patterns"). Classical algorithms that are most similar to the
neural net model are shown at the bottom (from Lippman, 1987).
2.1 Hopfield Net

The Hopfield net is one of three nets in Figure 2 that use
binary input. Such nets are appropriate for input values from
images where the pixel data are determined to be greater than or
less than some threshold value. The Hopfield net can be used as
a content-addressable memory or to solve optimization problems.
This net's nodes use the hard-limiter nonlinearity for binary
inputs of values +1 and -1. The output of each node is fed back
to all other nodes by a symmetrical weight matrix. By providing
a set of examples and specified output classes, the weight matrix
can be iteratively derived. When an unknown pattern is presented
to the Hopfield net, it iterates in discrete time steps until the
outputs no longer change on successive iterations. When used as
a memory model, the resultant output is the desired memory
information. When used as a classifier, the pattern specified by the
outputs is compared to the class outputs to see if an example
pattern is matched. If none is matched, a "no match" result
occurs.

Hopfield nets can be used to provide memory retrieval or
classification of very noisy input data. However, there are two
major limitations to these nets. First, the number of output
pattern classes that can be recalled is quite limited unless the
network size is quite large. If too many patterns are stored for
a given net size, the net may converge to a spurious pattern
different from all of the example patterns, and the network will
not be able to classify at all. This problem occurs when the
number of output pattern classes is greater than 0.15 times the
number of nodes in the net. Thus, a Hopfield net to distinguish
10 classes would require at least 67 nodes. Since each node is
connected to all the others, the weight matrix would be 67x67, or
almost 4500 weights. Thus, the training time and storage
requirements could quickly become too large for big problems. The
second problem occurs when output classes are too similar. In
this case, the net may converge to the wrong class given similar
input patterns.
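
The sizing arithmetic above can be reproduced with a short calculation. The function names below are illustrative only; the 0.15 ratio is the figure quoted in the text, held as an exact fraction so that rounding does not disturb the result.

```python
import math
from fractions import Fraction

CAPACITY_RATIO = Fraction(15, 100)   # 0.15 classes per node, as quoted above

def min_hopfield_nodes(num_classes, ratio=CAPACITY_RATIO):
    """Smallest node count N for which num_classes <= ratio * N."""
    return math.ceil(Fraction(num_classes) / ratio)

def weight_matrix_entries(num_nodes):
    """Entries in the square weight matrix connecting every node pair."""
    return num_nodes * num_nodes

nodes = min_hopfield_nodes(10)
print(nodes, weight_matrix_entries(nodes))   # 67 nodes, 4489 matrix entries
```

Doubling the number of classes doubles the required node count but quadruples the size of the weight matrix, which is why storage grows quickly for larger problems.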

2.2 Hamming Net

The  Hamming  net  (Fig.  2)  is  similar  to  the  Hopfield  net 
except  that  nodes  use  the  threshold  logic  nonlinearity  (Fig.  1) 
and  the  network  consists  of  two  layers.  The  first  layer  operates 
much  like  a  Hopfield  net.  Binary  patterns  are  used  as  inputs, 
but  the  outputs  are  matching  scores  defined  by  the  so-called 
Hamming  distance  between  the  input  and  the  correct  class  for  each 
case.  The  Hamming  distance  is  the  number  of  bits  in  the  input 
which  do  not  match  the  corresponding  class  example  bits.  The 
outputs  of  the  first  layer,  after  convergence,  are  used  as  an 
input  layer  to  the  second  layer.  This  layer  iterates  until  only 
one  output  node  is  positive  and  the  rest  are  negative.  This  node 
corresponds  to  the  correct  class. 
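
The quantity that the Hamming net evaluates can be illustrated directly. The sketch below computes Hamming distances to a set of class exemplars and picks the closest one; it reproduces the decision the net is described as reaching, not the iterative two-layer dynamics themselves. The exemplar patterns and labels are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

def classify(pattern, exemplars):
    """Return the label of the exemplar nearest to pattern in Hamming distance."""
    return min(exemplars,
               key=lambda label: hamming_distance(pattern, exemplars[label]))

exemplars = {                       # hypothetical class examples
    "A": [1, 1, 1, -1, -1, -1],
    "B": [-1, -1, -1, 1, 1, 1],
}
noisy = [1, -1, 1, -1, -1, -1]      # class A with one bit flipped
print(classify(noisy, exemplars))   # A
```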

The Hamming net performs as well as or better than the Hopfield
net while requiring fewer connections, typically only as many
nodes as the sum of the number of inputs plus the number of
classes. Thus, for bigger problems, the Hopfield net size grows
as the square of the number of inputs (every node is connected to
every other node) while the Hamming net size grows linearly. Also,
the Hamming net does not have the spurious output problem that
the Hopfield net has. The Hamming net is a direct neural net
implementation of the "optimum classifier" algorithm used in
communications theory to reduce noise in binary signals.

2.3 Carpenter/Grossberg Net

The  Carpenter/Grossberg  net  is  an  implementation  of  their 
Adaptive  Resonance  Theory.  This  net  forms  clusters  and  is 
trained  without  supervision.  The  net  structure  is  similar  to  the 
Hamming  net  except  that  a  threshold  value  is  used  to  measure  how 
close  the  outputs  from  successive  cases  are  to  each  other.  The 
first  case  defines  a  locus  for  the  first  cluster.  The  next  case 
is  processed  through  the  net  and  is  clustered  with  the  first  case 
if  the  distance  between  them  is  less  than  the  threshold.  If  not, 
it  becomes  the  locus  for  a  new  cluster.  The  process  is  repeated 
for  all  of  the  test  cases.  Thus,  the  number  of  classes  grows 
with  time,  especially  when  the  threshold  distance  is  small.  Each 
new  class  requires  a  new  node,  and  as  many  new  weights  as  two 
times  the  number  of  inputs,  to  compute  matching  scores. 

The Carpenter/Grossberg net can work well with perfect input
patterns. However, even a small amount of noise can cause problems.
With noise, or a threshold value that is too small, all
available nodes can quickly be used up and similar patterns will
still fail to be clustered. This type of net is very similar to
the "sequential leader clustering" statistical algorithm.
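
The clustering procedure described in this subsection can be sketched as a sequential leader algorithm. This is an illustration of the statistical analogue, not of the Carpenter/Grossberg net itself; the Hamming distance measure, the rule of joining the first leader within the threshold, and the example cases are all assumptions made for the sketch.

```python
def leader_cluster(cases, threshold, distance):
    """Join each case to the first leader closer than threshold, else add a leader."""
    leaders = []
    labels = []
    for case in cases:
        for index, leader in enumerate(leaders):
            if distance(case, leader) < threshold:
                labels.append(index)
                break
        else:
            # No existing leader is close enough: this case starts a new cluster.
            leaders.append(case)
            labels.append(len(leaders) - 1)
    return leaders, labels

def hamming(a, b):
    return sum(1 for x, y in zip(a, b) if x != y)

cases = [                      # hypothetical binary input cases
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
print(leader_cluster(cases, threshold=2, distance=hamming)[1])   # [0, 0, 1, 1]
print(leader_cluster(cases, threshold=1, distance=hamming)[1])   # [0, 1, 2, 3]
```

With the smaller threshold every case becomes its own cluster, which mirrors the remark that a threshold that is too small uses up the available nodes without grouping similar patterns.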
2.4 Single-Layer Perceptron Net

The single-layer perceptron and two other nets in Figure 2
can use both continuous-valued and binary input data, which are
simply a subset of continuous-valued data. A single perceptron
node is depicted in Figure 3. This artificial neuron decides
whether an input set belongs to one of two classes denoted A or
B. The weighted sum of the input values minus some optional
threshold is passed through the hard-limiter nonlinearity, yielding
an output of +1 or -1. The decision rule is to choose class
A if the output is +1 and class B if -1. Lippman (1987) graphically
depicts the behavior of such a perceptron by plotting a map
of the multidimensional space defined by the input variables.

Figure 3. A single-layer perceptron that classifies input vectors
into two classes denoted A and B.

Figure 4. Graphical representation of the space formed by two
inputs and the hyperplane (line) formed by a neural net which
classifies the cases into two regions.

The first two components of that space are depicted in Figure 4.
The weighted sum of the input values defines a pair of decision
regions that specify which input values result in a class A or a
class B response. Thus, the perceptron defines a hyperplane that
separates the two decision regions. In the two-dimensional
example (Fig. 4) the hyperplane is a line, and inputs above the
line lead to class A and those below to class B. Thus, it can be
seen that when the classes are not overlapping the perceptron
gives a perfect classification of the cases. When there is an
overlap, there must be some incorrect classifications. By changing
the nonlinearity to the threshold-logic (Fig. 1), and iteratively
correcting the weights on every trial by an amount proportional
to the difference between the desired and actual outputs,
the perceptron converges to the least-mean-square solution in
these situations. It will be shown in the next section that the
problem of overlapping decision regions is solved by adding more
layers of perceptrons. The difficulty in this approach is the
question of how to adjust the weights of previous levels in
response to errors occurring in higher levels. It was the solution
of this problem that renewed interest in the neural network
methodology in recent years. The closest analogy to perceptron
nets is the maximum-likelihood Gaussian classifier.
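
A single perceptron of the kind described in this subsection can be sketched as follows. The sketch keeps the hard-limiter output and corrects the weights by an amount proportional to the difference between desired and actual outputs, so it illustrates the non-overlapping case rather than the least-mean-square variant. The learning rate, the epoch limit and the six training points are assumptions chosen for illustration.

```python
def hard_limiter(x):
    return 1 if x >= 0 else -1

def predict(weights, theta, inputs):
    """Class A is +1 and class B is -1."""
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return hard_limiter(net)

def train_perceptron(cases, rate=0.1, max_epochs=1000):
    """Adjust weights and threshold until every case is classified correctly."""
    weights = [0.0] * len(cases[0][0])
    theta = 0.0
    for _ in range(max_epochs):
        errors = 0
        for inputs, desired in cases:
            diff = desired - predict(weights, theta, inputs)
            if diff != 0:
                errors += 1
                weights = [w + rate * diff * x for w, x in zip(weights, inputs)]
                theta -= rate * diff   # theta enters the sum with a minus sign
        if errors == 0:
            break
    return weights, theta

cases = [                       # hypothetical, non-overlapping classes
    ((2.0, 3.0), 1), ((1.0, 2.5), 1), ((3.0, 4.0), 1),
    ((2.0, 0.5), -1), ((3.0, 1.0), -1), ((1.0, -1.0), -1),
]
weights, theta = train_perceptron(cases)
print([predict(weights, theta, inputs) for inputs, _ in cases])
```

The learned weights and threshold define the separating line of Figure 4: points on one side produce +1 and points on the other produce -1.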

2.5 Multi-layer Perceptron Net

Multi-layer perceptron nets have one or more layers of nodes
between the input and output layers. These so-called "hidden
layers" solve many of the limitations of single-layer perceptrons,
but were impossible to train effectively until recently.
The training problem was solved when Rumelhart et al. (1986)
devised the "back-propagation" algorithm¹. In back-propagation,
the error between the actual and expected output is used to
adjust not only the weights contributing directly to the output,
but also the weights contributing to all of the contributing
units in previous layers.

¹ The author was fortunate last summer to have attended a one-day
tutorial on neural networks taught by Dave Rumelhart and the
leader in speech-recognition neural networks, Terry Sejnowski.

Figure 5. Types of decision regions that can be formed by single-
and multi-layer perceptron nets with one and two layers of
hidden units and two inputs. Shading denotes decision region for
class A. Smooth, closed contours bound input distributions for
classes A and B (from Lippman, 1987).

The power of multi-layer perceptron nets derives from the
nonlinearities used in the nodes. If the nodes were linear
processors then a single-layer network could do the same calculations
as a multi-layer network. The comparative capabilities of
single-layer, two-layer and three-layer perceptron nets using
hard-limiter nonlinearities can be seen in Figure 5. As in the
previous section, the decision regions for various problems are
described geometrically. The second column (Fig. 5) lists the
types of geometric decision regions that can be defined by the
net structure described in the first column. The third column
shows a graphical depiction of the exclusive-or problem and
decision regions (shaded vs. nonshaded) that might result for
each net. Notice that the single-layer perceptron cannot solve
the exclusive-or problem because the regions cannot be separated
by a hyperplane. The network with one hidden layer, however, can
solve the exclusive-or problem because the extra decision layer
allows it to define a convex region in the space defined by the
inputs (Fig. 5). It turns out that the number of sides on the
convex region is determined by the number of nodes in the hidden
layer. By adding a second hidden layer, two distinct convex
regions can be defined (Fig. 5).
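
The exclusive-or argument can be checked with a small hand-built network. In the sketch below the weights are set by hand rather than learned, and they are only one of many possible choices: two hidden hard-limiter nodes each define a half-plane, and the output node accepts the convex strip where both hold. A coarse grid search over single-node weights is included as a demonstration (not a proof) that one node alone fails.

```python
from itertools import product

def hard_limiter(x):
    return 1 if x >= 0 else -1

def node(inputs, weights, theta):
    return hard_limiter(sum(w * x for w, x in zip(weights, inputs)) - theta)

def xor_net(x1, x2):
    """One hidden layer: two half-planes whose intersection is a convex strip."""
    above_lower_line = node((x1, x2), (1.0, 1.0), 0.5)      # +1 when x1 + x2 >= 0.5
    below_upper_line = node((x1, x2), (-1.0, -1.0), -1.5)   # +1 when x1 + x2 <= 1.5
    # The output node fires only when both hidden nodes fire (logical AND).
    return node((above_lower_line, below_upper_line), (1.0, 1.0), 1.0)

XOR_CASES = {(0, 0): -1, (0, 1): 1, (1, 0): 1, (1, 1): -1}

def single_node_solves_xor(weights, theta):
    return all(node(x, weights, theta) == target
               for x, target in XOR_CASES.items())

grid = [step / 4.0 for step in range(-8, 9)]    # -2.0 to 2.0 in steps of 0.25
found = any(single_node_solves_xor((w1, w2), theta)
            for w1, w2, theta in product(grid, repeat=3))

print([xor_net(*x) for x in XOR_CASES])         # [-1, 1, 1, -1]
print("single node solves XOR on this grid:", found)   # False
```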
The fourth column in Figure 5 depicts network decision
regions for a two-class problem in which a portion of each solution
region extends into the concave area formed by the other.
For these "meshed" decision regions, the single-layer net again
cannot solve the problem. Because the solution space requires a
concave decision region, even the net with one hidden layer
cannot solve the problem completely (Fig. 5). Only the use of a
second hidden layer can accomplish such a concave decision region
(Fig. 5). The last column in Figure 5 depicts the most general
solution region shape for each net configuration. Since a net
with two hidden layers can form an arbitrarily complex solution
region, there is never a need to structure a net with more than
two hidden layers. Added complexity at this point results from
using more nodes in the hidden layers rather than more layers.

The analysis in Figure 5 applies only to perceptron nets
using the hard-limiter and having a single output node. In
actual practice multiple output nodes and the sigmoidal nonlinearity
are used. The behavior of these nets is even more complex,
resulting in curved decision region boundaries rather than
the straight-line segments depicted here.

Multi-layer perceptron nets can solve a wide range of deterministic
problems such as the exclusive-or problem, and have been
used in speech synthesis and recognition and visual pattern
recognition. The back-propagation method has some convergence
problems, notably a tendency to converge to locally-minimum
least-squares solutions rather than the global minimum. As will
be shown in the next section, there are mathematical ways to
overcome this problem. Another problem is that convergence may
be slow, requiring many passes through the training set. This
problem is especially severe when the solution regions are complex
in shape and disconnected. Continuing the analogy with conventional
statistical methods, the multi-layer perceptron yields
results similar to the K-Nearest Neighbor classifier (Lee et al.,
1990).

2.6 Kohonen Self-Organizing Feature Maps

This methodology is an unsupervised learning technique that
uses continuous-valued inputs. In contrast to the Carpenter/Grossberg
net (Sec. 2.3), Kohonen's technique uses a predetermined
number of output classes. There is one output node for each
class, and the output nodes are extensively interconnected and
organized in a two-dimensional array. Input cases are presented
sequentially without specifying a desired output. A neighborhood
is defined around each output node resulting in weights that
cause output nodes to be topologically close to others sensitive
to similar inputs. The neighborhood size decreases in time,
causing a uniform classification of input patterns across the
range of output nodes.

The Kohonen net has been used in speech and pattern recognition
problems. It is less sensitive to noise because the number
of classes is predetermined. A drawback is that the net tends to
be sensitive to the order in which the training cases are processed,
especially for small training sets. In many respects it is
similar to the conventional K-Means Clustering algorithm.

2.7 Network Choice for Satellite Image Interpretation

If the satellite data being used were binary (e.g.,
cloud/no-cloud) and the classes (i.e., features to be recognized)
were well-defined shapes, the Hamming net might be a suitable
approach. However, the outputs consist of fronts, cloud lines,
vortices, etc., all of which can occur in many shapes and sizes.
Thus, the solution regions for this problem are likely complex in
shape and numerous. In addition, there might be some important
information in the radiance or gray-scale information which would
be masked by using binary data. Because of these considerations,
it seems clear that the multi-layer perceptron net has the best
chance at solving the satellite image interpretation problem.
Since we know what the desired output classes are, the Kohonen
classification methodology is probably not necessary.

Since  the  multi-layer  perceptron  methodology  is  indicated, 
the  following  section  presents  a  mathematical  description  of  the 
network  function  and  training  procedure. 

3.  Multi-level  Perceptron  Neural  Networks 

Multi-layer perceptron nets have one or more layers of
"hidden" units between the input and output layers. The activation
'a' of a given unit is a function of the weighted sum of the
input units to which it is connected. Thus, for any unit i the
activation due to a training pattern p is given by

$$a_{pi} = f(\mathrm{net}_{pi})$$

where

$$\mathrm{net}_{pi} = \sum_j w_{ij} a_{pj} + \mathrm{bias}_i$$

and f is the sigmoid function (Fig. 1). The bias term is analogous
to the perceptron offset θ, and can be thought of as a
scale factor. The least-mean-square (LMS) learning procedure of
Widrow and Hoff (1960) is used to adjust the weights. An error
function is defined that depends on the summed square of the
error for a set of training cases. The error of an output node
is simply the difference between the target output $t_{pi}$ and the
network output $a_{pi}$. Thus, the error $E_p$ for pattern p is
given by

$$E_p = \sum_i (t_{pi} - a_{pi})^2$$

The LMS error function defines an error space or surface for the
various potential weight values. The network is trained by
adjusting the weights such that the network error moves down the
error gradient toward a minimum value. This "gradient descent"
method makes the change in weight proportional to the negative of
the derivative of the error with respect to each weight. Thus,
the learning rule (also called the "delta rule") is

$$\Delta w_{ij} = -k \frac{\partial E_p}{\partial w_{ij}}$$

where k is the constant of proportionality. After taking the
derivative of E, the delta rule becomes

$$\Delta w_{ij} = \epsilon\, \delta_{pi}\, a_{pj}$$

where $\epsilon = 2k$ is the learning rate and

$$\delta_{pi} = (t_{pi} - a_{pi})\, f'_i(\mathrm{net}_{pi})$$

for output units. Here, $\mathrm{net}_{pi}$ is defined as above and
$f'_i(\mathrm{net}_{pi})$ is the derivative of the activation function
with respect to a change in the net input to the unit. Since we
require this derivative to exist, the hard-limiter and threshold
logic nonlinearities (Fig. 1), which are not differentiable
everywhere, cannot be used. The form of sigmoid function used is
the logistic function

$$a_{pi} = \frac{1}{1 + e^{-\mathrm{net}_{pi}}}$$

The derivative of this function can be shown to be

$$\frac{d a_{pi}}{d\,\mathrm{net}_{pi}} = a_{pi}(1 - a_{pi}).$$

Thus, the error $\delta_{pi}$ for an output node is given by

$$\delta_{pi} = (t_{pi} - a_{pi})\, a_{pi}(1 - a_{pi}).$$
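
The step from the gradient-descent rule to the output-unit delta rule can be written out with the chain rule. The derivation below is a sketch that assumes the per-pattern error is the plain sum of squared output errors (no factor of one half), which is what makes the learning rate equal to 2k; it applies to an output unit i receiving activation from unit j.

```latex
\begin{align*}
E_p &= \sum_i (t_{pi} - a_{pi})^2, \qquad
a_{pi} = f(\mathrm{net}_{pi}), \qquad
\mathrm{net}_{pi} = \sum_j w_{ij} a_{pj} + \mathrm{bias}_i \\
\frac{\partial E_p}{\partial w_{ij}}
  &= \frac{\partial E_p}{\partial a_{pi}}\,
     \frac{d a_{pi}}{d\,\mathrm{net}_{pi}}\,
     \frac{\partial\,\mathrm{net}_{pi}}{\partial w_{ij}}
   = -2\,(t_{pi} - a_{pi})\, f'(\mathrm{net}_{pi})\, a_{pj} \\
\Delta w_{ij} &= -k\,\frac{\partial E_p}{\partial w_{ij}}
   = 2k\,(t_{pi} - a_{pi})\, f'(\mathrm{net}_{pi})\, a_{pj}
   = \epsilon\,\delta_{pi}\, a_{pj} \\
f(x) &= \frac{1}{1 + e^{-x}} \;\Rightarrow\;
f'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = f(x)\,\bigl(1 - f(x)\bigr)
\end{align*}
```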

For hidden units, there is no target value $t_{pi}$. The
back-propagation technique propagates the error backward through the
network beginning with the output nodes. Since the error of an
output node is partially due to the inputs it receives from nodes
in previous layers, an output node error should result in the
adjustment of earlier weights. It is assumed that the error
propagation should be proportional to the contributing weights;
i.e., those nodes contributing more to the error should be
adjusted more. Thus, for hidden units we have

$$\delta_{pi} = f'_i(\mathrm{net}_{pi}) \sum_k \delta_{pk}\, w_{ki}$$

where k is the index of the unit to which unit i contributes.

Backward propagation involves two steps. First, the input
pattern is fed into the net and propagated forward to compute
output values $a_{pj}$ for each unit. These values are compared to the
target values, resulting in a δ term for each output unit. The
second phase is a backward pass through the network during which
the δ term is computed for each node. After these two phases,
the delta term for each weight can be computed.
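
The two phases can be sketched in plain Python for a net of logistic units. This is a minimal illustration written for this discussion, not the NOARL-NEURAL program: the list-of-layers data layout, the function names, the learning rate and the small 2-2-1 example net are all assumptions.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(layers, inputs):
    """Forward phase: return the activations of every layer, inputs first."""
    activations = [list(inputs)]
    for weights, biases in layers:
        prev = activations[-1]
        activations.append([
            logistic(sum(w * a for w, a in zip(row, prev)) + b)
            for row, b in zip(weights, biases)
        ])
    return activations

def backward(layers, activations, targets):
    """Backward phase: return one list of delta terms per non-input layer."""
    out = activations[-1]
    deltas = [[(t - a) * a * (1.0 - a) for t, a in zip(targets, out)]]
    for index in range(len(layers) - 1, 0, -1):
        weights_above = layers[index][0]   # rows are indexed by the unit k above
        hidden = activations[index]
        deltas.insert(0, [
            a * (1.0 - a) * sum(d * row[i] for d, row in zip(deltas[0], weights_above))
            for i, a in enumerate(hidden)
        ])
    return deltas

def train_step(layers, inputs, targets, rate=0.5):
    """Run both phases, then change every weight by rate * delta * activation."""
    activations = forward(layers, inputs)
    deltas = backward(layers, activations, targets)
    for (weights, biases), delta, prev in zip(layers, deltas, activations):
        for i, d in enumerate(delta):
            for j, a in enumerate(prev):
                weights[i][j] += rate * d * a
            biases[i] += rate * d
    return sum((t - a) ** 2 for t, a in zip(targets, activations[-1]))

# Hypothetical 2-2-1 net: each layer is (weight rows, biases).
layers = [([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1]), ([[0.5, -0.6]], [0.2])]
for _ in range(3):
    print(train_step(layers, [1.0, 0.0], [1.0]))
```

Each call prints the squared error measured before that call's weight change, so the printed values show the error for this single pattern as training proceeds.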

It is important to note that the activation derivative,
$a_{pi}(1 - a_{pi})$, is zero when the activation is a maximum (1.0) or a
minimum (0.0), and reaches a maximum when $a_{pi} = 0.5$. Thus, weights
will change most for activations not yet committed to being on
(1.0) or off (0.0) in the network.
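
These properties of the activation derivative are easy to confirm numerically. The short sketch below evaluates the derivative at a few activations and compares it with a finite-difference slope of the logistic function; the sample points are arbitrary.

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

def slope(activation):
    """Derivative of the logistic function written in terms of its output."""
    return activation * (1.0 - activation)

for a in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(a, slope(a))

# Cross-check against a central finite difference at an arbitrary net input.
net, h = 0.3, 1e-6
numeric = (logistic(net + h) - logistic(net - h)) / (2 * h)
print(abs(numeric - slope(logistic(net))) < 1e-8)
```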

Since  back-propagation  is  a  gradient-descent  procedure,  and 
since  the  error  surfaces  are  not  bowl-shaped,  there  may  be  a 
problem  of  settling  into  a  local  minimum.  This  problem  may  be 
alleviated  by  the  use  of  many  hidden  units. 

The number of training passes required before the net
achieves equilibrium can be quite large if the learning rate is
too small. Although a large learning rate causes the weights to
change faster, it can lead to instabilities that cause the
weights to oscillate rather than converge. One way to increase
learning rates without causing oscillation is to include a momentum
term in the delta equation:

$$\Delta w_{ij}(n+1) = \epsilon\,(\delta_{pi}\, a_{pj}) + \alpha\, \Delta w_{ij}(n)$$

where n is the number of trials using a learning sample and α is
the momentum constant. Thus, the weight change from the most recent
learning trial is used to stabilize the adjustment of weights on
the current trial.
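
The effect of the momentum term can be seen by applying the update to a single weight over a few trials. The learning rate, the momentum constant and the sequence of delta terms below are hypothetical values chosen only to show the behavior.

```python
def momentum_update(delta, activation, previous_change, rate=0.5, momentum=0.9):
    """Weight change for trial n+1: gradient part plus a fraction of the last change."""
    return rate * delta * activation + momentum * previous_change

change = 0.0
weight = 0.2
for delta in (0.10, 0.10, -0.10, 0.10):    # hypothetical delta terms from four trials
    change = momentum_update(delta, 1.0, change)
    weight += change
    print(round(change, 4), round(weight, 4))
```

When the delta term reverses sign on the third trial, the carried-over change keeps the weight moving in its earlier direction, which is the damping of oscillation that the momentum term is meant to provide.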
A program to derive multi-layer perceptron neural networks
has been written and tested. The routine, called "NOARL-NEURAL,"
currently runs on a Z-248 or on the HP-835. A copy of
the program and a User's Guide are available from the author.

4.  Tests  of  NOARL-NEURAL 

In this section, the pattern-recognition capability of
NOARL-NEURAL is demonstrated. First, a neural network to classify
simple geometric patterns is derived. For a slightly more
complex problem, the network is next trained and tested with
alphabetic characters. Finally, a crude application to interpretation
of satellite imagery is presented. As will be seen, the
surprisingly good results of these tests indicate a strong potential
for the use of neural nets to interpret satellite imagery.

4.1 Tests on Geometric Patterns

In this test, three geometric shapes are classified: circles,
triangles and squares. The data for these cases were
generated by hand-sketching shapes on a 5x5 grid such as the one
in Figure 6a. For each grid square, the amount covered by a
shape's line was visually estimated. Thus, for the example in
Figure 6b, square 1 has zero coverage, square 2 has 0.3 (30%)
coverage, etc. The training set included six sketches for each
of the three shapes, for a total of 18 cases (Fig. 7). Notice
that the circles are not always centered on the grid, and the
squares and triangles are presented in different orientations.

Figure 6. a) Grid of 5x5 squares used as a digitizing background
for sketches of geometric shapes, and b) example of a circle
drawn on the grid and the estimated coverage in each square.

The 25 grid-coverage values define an input set for a neural
network. This network consists of two layers: 25 inputs feeding
to a hidden layer of three units and an output layer also of
three units. Each output corresponds to one of the shapes. Thus,
a circle is assigned the output 0-0-1, a square has the output
0-1-0, and a triangle is given 1-0-0. The network was trained on
the 18 learning cases. The net converged very quickly, requiring
only 100 passes of the 18 cases before the total sum of the
squared error reached very low levels. The 18 dependent sample
cases were then processed through the network. As can be seen in
Table 1, all 18 were classified correctly.
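
The 25-3-3 arrangement and the squared-error measure can be sketched
as follows. This is only an illustration under stated assumptions:
logistic (sigmoid) units with bias terms are assumed, all names are
hypothetical, and the weights are set to zero to represent an
untrained net rather than anything derived by NOARL-NEURAL.

```python
import math


def sigmoid(x):
    """Logistic activation (assumed here; not stated in this section)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Activations of one fully connected layer of logistic units."""
    return [sigmoid(sum(w * a for w, a in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Propagate the inputs through the hidden and output layers."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


def sum_squared_error(outputs, targets):
    """Squared error summed over the output nodes for one case."""
    return sum((t - o) ** 2 for o, t in zip(outputs, targets))


# Untrained 25-3-3 network: every weight and bias is zero.
N_IN, N_HID, N_OUT = 25, 3, 3
hidden_w = [[0.0] * N_IN for _ in range(N_HID)]
hidden_b = [0.0] * N_HID
output_w = [[0.0] * N_HID for _ in range(N_OUT)]
output_b = [0.0] * N_OUT

coverage = [0.0] * N_IN        # 25 grid-coverage values
coverage[1] = 0.3              # e.g. square 2 has 30% coverage
circle_target = [0.0, 0.0, 1.0]

outputs = forward(coverage, hidden_w, hidden_b, output_w, output_b)
error = sum_squared_error(outputs, circle_target)
print(outputs, error)
```

Before any training every output sits at 0.5, so the error for a
single case is 0.75; training drives this total toward zero over the
whole learning sample.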

Table 1. Activations of a neural net with three output nodes,
and corresponding target activations, for 18 dependent sample
geometric shapes. Outputs corresponding to triangles, squares
and circles are 1-0-0, 0-1-0 and 0-0-1, respectively.

Case         Output Node Activations     Target Activations
                1      2      3             1      2      3
 1   T1       0.99   0.03   0.00          1.00   0.00   0.00
 2   S1       0.00   0.97   0.01          0.00   1.00   0.00
 3   C1       0.00   0.02   0.99          0.00   0.00   1.00
 4   T2       0.99   0.02   0.00          1.00   0.00   0.00
 5   S2       0.01   0.98   0.00          0.00   1.00   0.00
 6   C2       0.00   0.05   0.98          0.00   0.00   1.00
 7   T3       0.99   0.03   0.00          1.00   0.00   0.00
 8   S3       0.01   0.98   0.01          0.00   1.00   0.00
 9   C3       0.00   0.03   0.99          0.00   0.00   1.00
10   T4       0.99   0.04   0.00          1.00   0.00   0.00
11   S4       0.01   0.98   0.00          0.00   1.00   0.00
12   C4       0.00   0.02   0.99          0.00   0.00   1.00
13   T5       0.99   0.02   0.00          1.00   0.00   0.00
14   S5       0.01   0.97   0.01          0.00   1.00   0.00
15   C5       0.00   0.01   0.98          0.00   0.00   1.00
16   T6       0.99   0.02   0.00          1.00   0.00   0.00
17   S6       0.01   0.94   0.01          0.00   1.00   0.00
18   C6       0.00   0.02   0.99          0.00   0.00   1.00

A test sample of 12 cases was next defined (Fig. 8). There was
no conscious effort to make these cases similar to the training
cases; in fact, the squares are particularly dissimilar to those
in the dependent sample (Fig. 8 vs. Fig. 7). The network
classification of the independent cases (Table 2) was very good,
with only
one incorrect classification. Case ST2 was correct, but the
0.00-0.63-0.31 output values indicate that the net considered
this square to be somewhat similar to a circle. The case ST3
outputs were 0.01-0.16-0.15. Thus, the net did not find much
similarity with any of the three shapes, but still just managed
to favor the correct classification. The one incorrect
classification, of circle CT3 as a square, was unambiguous, as
evidenced by the 0.02-0.96-0.00 output.
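
The way these output triplets are read can be sketched in a few
lines. The decision rule assumed here, that the node with the
largest activation names the class, is inferred from the discussion
above rather than quoted from the program; the margin over the
runner-up is added only as a rough indication of how decisive each
result is.

```python
SHAPES = ("triangle", "square", "circle")   # output nodes 1, 2, 3


def classify(activations):
    """Return (shape, margin) for one set of output activations.

    The shape is the node with the largest activation (an assumed
    rule); the margin is its lead over the second-largest node.
    """
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    best, second = ranked[0], ranked[1]
    return SHAPES[best], activations[best] - activations[second]


# Output values quoted in the text, with the intended shape.
cases = [
    ("ST2", [0.00, 0.63, 0.31], "square"),
    ("ST3", [0.01, 0.16, 0.15], "square"),
    ("misclassified circle", [0.02, 0.96, 0.00], "circle"),
]
for name, activations, target in cases:
    shape, margin = classify(activations)
    verdict = "correct" if shape == target else "wrong"
    print(name, shape, round(margin, 2), verdict)
```

The ST3 margin of only 0.01 shows how narrowly that case was
decided, whereas the misclassified circle is assigned to the square
class by a wide margin.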

Figure 7. Dependent sample of 18 circles, squares and triangles
used to train the neural net. (Labels are case numbers.)

Table 2. As in Table 1 except for 12 independent sample cases.

Case         Output Node Activations     Target Activations
                1      2      3             1      2      3
 1   TT1      0.99   0.02   0.00          1.00   0.00   0.00
 2   ST1      0.02   0.96   0.00          0.00   1.00   0.00
 3   CT1      0.00   0.02   0.99          0.00   0.00   1.00
 4   TT2      0.99   0.02   0.00          1.00   0.00   0.00
 5   ST2      0.00   0.63   0.31          0.00   1.00   0.00
 6   CT2      0.00   0.03   0.99          0.00   0.00   1.00
 7   TT3      0.99   0.02   0.00          1.00   0.00   0.00
 8   ST3      0.01   0.16   0.15          0.00   1.00   0.00
 9   CT3      0.02   0.96   0.00          0.00   0.00   1.00
10   TT4      0.?9   0.03   0.00          1.00   0.00   0.00
11   ST4      0.01   0.97   0.00          0.00   1.00   0.00
12   CT4      0.00   0.14   0.92          0.00   0.00   1.00

This test is admittedly crude in that inexact, hand-generated,
visually-estimated input data are used. The small sample sizes
also preclude any generalization based on these results.
However, one goal of this test was to verify the function of the
NOARL-NEURAL program. The good independent sample results
indicate that the program is indeed deriving a skillful neural
network. Despite the low number of hidden layer units, the net
still shows the ability to classify most of the cases correctly.

4.2 Tests on Alphabetic Characters

Although they are presented in different orientations, all of
the geometric shapes in each of the previous example classes are
basically the same. To present a more difficult task, the net was
given four alphabetic characters to recognize: 'A', 'a', 'B' and
'b'. What makes this test more difficult is that there are only
two output classes; that is, the net must determine whether the
input pattern indicates the first or the second letter of the
alphabet, regardless of upper or lower case. Thus, the network
must produce the same outputs for the very dissimilar input
patterns 'A' and 'a', and for 'B' and 'b'.

As in the previous test, sketches of the letters were made on
the 5x5 grid (Fig. 9). There are 6 upper- and 6 lower-case
occurrences for each letter, for a total of 24 examples in the
training set. Notice that the letters are of different sizes and
shapes, and many occupy different locations in the grid. The
letters 'A' and 'a' were assigned the outputs 1-0, while 'B' and
'b' were assigned 0-1.
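
The target assignment just described amounts to a case-insensitive
lookup. The short sketch below, in which all names are hypothetical,
builds the 24 target patterns for a training set laid out as six
sketches of each of the four characters; the ordering of the cases
is an assumption made only to keep the example short.

```python
# Target output patterns: first letter -> 1-0, second letter -> 0-1.
TARGETS = {"A": (1, 0), "B": (0, 1)}


def target_for(letter):
    """Upper- and lower-case forms of a letter share one target."""
    return TARGETS[letter.upper()]


# Six sketches of each character, 24 cases in all.
training_labels = ["A"] * 6 + ["a"] * 6 + ["B"] * 6 + ["b"] * 6
targets = [target_for(ch) for ch in training_labels]

print(len(targets), targets[0], targets[6], targets[12], targets[18])
```

Because 'A' and 'a' share a target, the burden of relating the two
dissimilar input patterns falls entirely on the hidden layer.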

The network for this test consisted of 25 inputs, 4 hidden units
and 2 output units. Again, the net converged very quickly and
training was stopped at 100 iterations. When the 24 dependent
sample cases were processed through the net, all were correctly
classified (Table 3).

An independent sample of 12 cases (Fig. 10) was used to test the
network. All but one were correctly classified (Table 4). In the
incorrectly classified case (AT4), a lower-case 'a' was
classified as a 'b', probably due to a somewhat large lower loop
size compared to the lower loops of the training set examples.

Again, this test is for a small sample size and the net
configuration is small and only two-layered. Even so, the net has
the ability to discern almost all of the different letters
regardless of letter case. These encouraging results set the
stage for a test of neural nets on actual satellite imagery which
will be presented in the next section.

Figure 10. Independent sample of 12 alphabetic letters.
(Labels are case numbers.)

Table 3. Activations of a neural net with two output nodes, and
corresponding target activations, for 24 dependent sample
alphabetic letters. Outputs corresponding to 'A' or 'a' are 1-0;
outputs corresponding to 'B' or 'b' are 0-1.

Case         Output Node Activations     Target Activations
                1      2                    1      2
 1   A1       0.99   0.00                 1.00   0.00
 2   B1       0.00   0.99                 0.00   1.00
 3   A2       0.99   0.00                 1.00   0.00
 4   B2       0.00   0.99                 0.00   1.00
 5   A3       0.99   0.01                 1.00   0.00
 6   B3       0.00   0.99                 0.00   1.00
 7   A4       0.99   0.01                 1.00   0.00
 8   B4       0.02   0.99                 0.00   1.00
 9   A5       0.99   0.00                 1.00   0.00
10   B5       0.00   0.99                 0.00   1.00
11   A6       0.99   0.00                 1.00   0.00
12   B6       0.00   0.99                 0.00   1.00
13   A7       0.99   0.00                 1.00   0.00
14   B7       0.00   0.99                 0.00   1.00
15   A8       0.99   0.01                 1.00   0.00
16   B8       0.00   0.99                 0.00   1.00
17   A9       0.99   0.00                 1.00   0.00
18   B9       0.00   0.99                 0.00   1.00
19   A10      0.99   0.00                 1.00   0.00
20   B10      0.00   0.99                 0.00   1.00
21   A11      0.99   0.00                 1.00   0.00
22   B11      0.00   0.99                 0.00   1.00
23   A12      0.99   0.00                 1.00   0.00
24   B12      0.00   0.99                 0.00   1.00

4.3 Tests on Satellite Imagery

In this test, the more ambitious goal of distinguishing between
major cloud patterns on a satellite image will be attempted. The
images used will be a sequence of GOES infrared
(IR) pictures presented in the Navy Tactical Applications Guide
(NTAG) Vol. 4 (Fett et al., 1984). In Section 1B of that NTAG,
an eastern North Pacific blocking situation is depicted. In
choosing the types of cloud patterns to distinguish, it was
considered desirable to include patterns of approximately the
same size and that occur somewhat in isolation; i.e., the pattern
is distinct and is not contiguous with an adjacent pattern. Upon
examination, three major cloud patterns that meet these criteria
were determined: frontal bands, cirrus clouds and vortices. The
last category includes only those vortical clouds not associated
with an accompanying frontal band; e.g., those associated with
troughs or cut-off lows. In choosing cases, the description from
the NTAG text was used to determine the correct feature
classification.

Table 4. As in Table 3 except for 12 independent sample cases.

Case         Output Node Activations     Target Activations
                1      2                    1      2
 1   AT1      0.99   0.00                 1.00   0.00
 2   AT2      0.98   0.02                 1.00   0.00
 3   BT1      0.00   0.99                 0.00   1.00
 4   BT2      0.00   0.99                 0.00   1.00
 5   AT3      0.99   0.00                 1.00   0.00
 6   AT4      0.00   0.99                 1.00   0.00
 7   BT3      0.00   0.99                 0.00   1.00
 8   BT4      0.00   0.99                 0.00   1.00
 9   AT5      0.99   0.00                 1.00   0.00
10   AT6      0.99   0.00                 1.00   0.00
11   BT5      0.48   0.55                 0.00   1.00
12   BT6      0.00   0.99                 0.00   1.00

Figure 11. Template which, after transfer to acetate, forms
grid of 5°x5° squares used to digitize GOES infrared cloud

Figure 12. GOES infrared image from Fett et al. (1984), which
includes several cloud features used in the neural cloud pattern
recognizer (see Table 5).

As in the previous tests, a 5x5 input grid was used. In this
case, however, a template conforming to the GOES image was
defined (Fig. 11) such that each grid square was 5°x5°. The
template was converted to acetate such that it could be overlaid
on the GOES image (Fig. 12). The grid square that falls at the
center of each cloud feature defines the center of a 5x5 square
region. For each cloud pattern, the amount of upper (i.e., bright
on IR) cloud cover in each square was visually estimated. For
this network a 26th input parameter, the northernmost latitude of
the grid centered on each feature, was added. A total of 19
images were used, resulting in 18 examples of each cloud pattern
(Fig. 13). The images will not be reproduced here; a table of the
54 cases (Table 5) can be used in conjunction with the NTAG to
determine which cloud patterns were used. In this study, every
third case of each type was separated to form an independent
test sample.
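
The separation of every third case can be sketched as follows. The
labels C1 to C18, F1 to F18 and V1 to V18 are an assumption modeled
on the case labels used in Tables 6 and 7, and the function name is
hypothetical.

```python
def split_every_third(cases):
    """Return (dependent, independent) lists.

    Cases numbered 3, 6, 9, ... are held out as the independent
    test sample; all others form the dependent (training) sample.
    """
    dependent, independent = [], []
    for number, case in enumerate(cases, start=1):
        if number % 3 == 0:
            independent.append(case)
        else:
            dependent.append(case)
    return dependent, independent


dependent, independent = [], []
for prefix in ("C", "F", "V"):          # cirrus, front, vortex
    cases = [f"{prefix}{n}" for n in range(1, 19)]
    dep, ind = split_every_third(cases)
    dependent += dep
    independent += ind

print(len(dependent), len(independent))   # 36 18
print(independent[:6])   # ['C3', 'C6', 'C9', 'C12', 'C15', 'C18']
```

With 18 examples of each pattern this yields 12 dependent and 6
independent cases per type, or 36 and 18 cases overall.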

Again, a two-layer network was used, this time consisting of 26
inputs feeding to three hidden and three output units. Cirrus was
assigned the output 1-0-0, fronts were assigned 0-1-0 and
vortices were given 0-0-1. When trained on the dependent sample
cases, the net was comparatively slow to converge. Several tests
of 500-1000 iterations were tried with only limited success. At
the time of these tests, only the Z-248 version was available.
It was finally decided to initiate the program at the end of a
work day and let it iterate overnight. By morning, it had
completed 30,000 iterations. When this network was tested on the
dependent sample, all but one of the 36 cases were correctly
classified (Table 6). The one error was in classifying a front
(case 29, F-14) as a vortex, although the front in case 8 (F-4)
was just barely classified correctly.

Figure 13a. 18 digitized cirrus GOES cases. Label beneath each
grid is the NTAG reference (see Table 5). Cases marked "-I" form
the independent test sample.

Figure 13b. As in Fig. 13a except for 18 digitized GOES fronts.

Figure 13c. As in Fig. 13a except for 18 digitized GOES cloud
vortices.

Table 5. 54 cirrus, front and vortex cloud patterns taken from
NTAG Vol. 4 (Fett et al., 1984) for training and testing a
neural net. Labels used in NTAG are included for reference, as
well as latitude and longitude of grid center used in digitizing
the cloud amounts.

NTAG
Figure   Cloud patterns: label (latitude, longitude of grid center)
1B3a     C" (12.5N,162.5W); F" (47.5N,127.5W); A" (27.5N,147.5W);
         D" (17.5N,132.5W); H" (47.5N,172.5W)
1B7a     jet (22.5N,127.5W); I" (42.5N,172.5W); A" (27.5N,147.5W);
         jet (12.5N,142.5W); F" (42.5N,97.5W)
1B11a    jet (22.5N,132.5W); jet (12.5N,157.5W); I" (42.5N,157.5W);
         A" (32.5N,142.5W)
1B15a    PJS (22.5N,142.5W); J" (42.5N,172.5W); K" (42.5N,137.5W)
1B19a    J"1 (42.5N,157.5W); Q" (42.5N,172.5W); P" (27.5N,127.5W)
1B23a    jet (17.5N,117.5W); Q" (47.5N,152.5W); P" (27.5N,117.5W)
1B27a    jet (17.5N,152.5W); S" (27.5N,142.5W); P" (27.5N,117.5W)
1B30a    jet (22.5N,142.5W); S" (37.5N,147.5W)
1B31a    jet (22.5N,132.5W); S" (37.5N,147.5W); U" (42.5N,172.5W)
1B31c    jet (22.5N,132.5W); U" (42.5N,167.5W)
1B32a    jet (27.5N,127.5W); U" (37.5N,157.5W)
1B33a    jet (12.5N,127.5W); U" (37.5N,152.5W)
1B35a    jet (22.5N,122.5W)
1B38a    jet (27.5N,117.5W); V" (37.5N,172.5W); U"3 (37.5N,152.5W)
1B39a    jet (27.5N,112.5W); V" (37.5N,167.5W); U"3 (32.5N,142.5W)
1B39c    V" (42.5N,167.5W); X" (32.5N,132.5W)
1B40a    X" (32.5N,127.5W); V" (47.5N,172.5W); V"1 (37.5N,152.5W)
1B41a    X" (32.5N,122.5W); V" (47.5N,172.5W); V"1 (37.5N,147.5W)
1B43a    Y"3 (27.5N,152.5W); V" (47.5N,172.5W)

On the independent sample, only limited success was expected due
to the small dependent sample size, the coarse resolution of the
input grid, the inexact method of determining the cloud cover
amount, the use of only one radiance channel, and the goal of
distinguishing somewhat similar patterns that occur in a wide
variety of shapes. However, on these 18 test cases, the net was
able to correctly classify every case (Table 7). The likely

reason for this surprising success is that the cases are selected
from a time sequence of images, so that there is some similarity
between instances of a pattern on images separated by a few
hours. Nevertheless, these results are a very encouraging
indication that the neural pattern recognition approach can be
applied to the cloud pattern identification problem.

Table 6. Activations of a neural net with three output nodes,
and corresponding target activations, for 36 dependent sample
cloud patterns. Outputs corresponding to cirrus, fronts and
vortices are 1-0-0, 0-1-0 and 0-0-1, respectively.

Case         Output Node Activations     Target Activations
                1      2      3             1      2      3
 1   C1       0.99   0.06   0.00          1.00   0.00   0.00
 2   F1       0.00   0.99   0.00          0.00   1.00   0.00
 3   V1       0.00   0.03   0.99          0.00   0.00   1.00
 4   C2       0.99   0.06   0.00          1.00   0.00   0.00
 5   F2       0.00   0.99   0.00          0.00   1.00   0.00
 6   V2       0.00   0.03   0.99          0.00   0.00   1.00
 7   C4       0.99   0.06   0.00          1.00   0.00   0.00
 8   F4       0.03   0.05   0.00          0.00   1.00   0.00
 9   V4       0.00   0.03   0.99          0.00   0.00   1.00
10   C5       0.99   0.06   0.00          1.00   0.00   0.00
11   F5       0.00   0.99   0.00          0.00   1.00   0.00
12   V5       0.00   0.03   0.99          0.00   0.00   1.00
13   C7       0.99   0.06   0.00          1.00   0.00   0.00
14   F7       0.00   0.99   0.00          0.00   1.00   0.00
15   V7       0.00   0.03   0.99          0.00   0.00   1.00
16   C8       0.99   0.06   0.00          1.00   0.00   0.00
17   F8       0.00   0.99   0.00          0.00   1.00   0.00
18   V8       0.00   0.03   0.99          0.00   0.00   1.00
19   C10      0.99   0.06   0.00          1.00   0.00   0.00
20   F10      0.00   0.99   0.00          0.00   1.00   0.00
21   V10      0.00   0.03   0.99          0.00   0.00   1.00
22   C11      0.99   0.06   0.00          1.00   0.00   0.00
23   F11      0.00   0.99   0.00          0.00   1.00   0.00
24   V11      0.00   0.03   0.99          0.00   0.00   1.00
25   C13      0.99   0.06   0.00          1.00   0.00   0.00
26   F13      0.00   0.99   0.00          0.00   1.00   0.00
27   V13      0.00   0.03   0.99          0.00   0.00   1.00
28   C14      0.99   0.06   0.00          1.00   0.00   0.00
29   F14      0.00   0.03   0.99          0.00   1.00   0.00
30   V14      0.00   0.03   0.99          0.00   0.00   1.00
31   C16      0.99   0.06   0.00          1.00   0.00   0.00
32   F16      0.00   0.99   0.31          0.00   1.00   0.00
33   V16      0.00   0.03   0.99          0.00   0.00   1.00
34   C17      0.92   0.05   0.00          1.00   0.00   0.00
35   F17      0.00   0.99   0.00          0.00   1.00   0.00
36   V17      0.00   0.03   0.99          0.00   0.00   1.00

Table 7. As in Table 6 except for 18 independent sample cases.

Case         Output Node Activations     Target Activations
                1      2      3             1      2      3
 1   C3-I     0.99   0.07   0.00          1.00   0.00   0.00
 2   F3-I     0.00   0.99   0.00          0.00   1.00   0.00
 3   V3-I     0.00   0.03   0.99          0.00   0.00   1.00
 4   C6-I     0.99   0.06   0.00          1.00   0.00   0.00
 5   F6-I     0.00   0.99   0.00          0.00   1.00   0.00
 6   V6-I     0.00   0.03   0.99          0.00   0.00   1.00
 7   C9-I     0.99   0.06   0.00          1.00   0.00   0.00
 8   F9-I     0.00   0.99   0.00          0.00   1.00   0.00
 9   V9-I     0.00   0.03   0.99          0.00   0.00   1.00
10   C12-I    0.99   0.06   0.00          1.00   0.00   0.00
11   F12-I    0.00   0.99   0.00          0.00   1.00   0.00
12   V12-I    0.00   0.03   0.99          0.00   0.00   1.00
13   C15-I    0.99   0.06   0.00          1.00   0.00   0.00
14   F15-I    0.00   0.99   0.00          0.00   1.00   0.00
15   V15-I    0.00   0.03   0.99          0.00   0.00   1.00
16   C18-I    0.96   0.06   0.00          1.00   0.00   0.00
17   F18-I    0.00   0.99   0.00          0.00   1.00   0.00
18   V18-I    0.00   0.03   0.99          0.00   0.00   1.00

5.  Proposed  Cloud  Image  Feature  Recognition  System 

These results are a very encouraging indication that neural
cloud pattern recognition is indeed possible. This study,
however, was designed only to prove the concept, not to provide
an operational pattern recognition capacity. The types of
patterns distinguished here are quite restrictive compared to the
range of patterns seen on the day-to-day imagery. There are cloud
features of both large and small scales, and instances where
features may be contiguous or overlap. These problems must be
solved before neural cloud image interpretation is possible.

An additional problem that would be crucial to automated,
real-time image interpretation is the huge amount of pixel data
in each image. Processing every pixel value, even if only to
access it as an input to a neural net, would likely require an
unacceptably large amount of computer resources if done
sequentially, or expensive hardware if done in parallel.

Based on the results of this study, a potential processing
architecture for automated cloud feature pattern recognition will
now be presented. The envisioned system is called the Cloud Image
Feature Recognition System, or CIFRS (pronounced "cy-fers").

As shown in this study, large-scale patterns can be recognized
based on fairly coarse input data from the image. Thus, it may
not be necessary to process data from every pixel, but rather
from every Nth pixel. The acceptable value of N could probably be
determined statistically from information about how much of such
a scene is cloudy and the total number of pixels it contains.
Given such a spot-sampling of the image, cloudy vs. non-cloudy
regions could be determined. Because of the robustness of the
neural interpretation methods, this initial cloud/no-cloud
determination could be from a simple radiance threshold method.
Alternatively, a neural net using inputs from many channels might
prove superior for this task. Adjacent cloudy pixels would then
be considered to define a continuously-cloudy region. Again, the
validity of this assumption depends on the number of pixels
omitted from the sample. Even if multiple features are
erroneously combined at this step, a later process will separate
them.
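
The three steps just outlined (spot-sampling, a radiance threshold,
and grouping of adjacent cloudy pixels) can be sketched as below.
This is not part of any existing CIFRS code: the function names, the
threshold of 128, the use of 4-connectivity to define "adjacent,"
and the tiny synthetic image are all assumptions made for
illustration.

```python
def sample_every_nth(image, n):
    """Keep every n-th pixel in both row and column directions."""
    return [row[::n] for row in image[::n]]


def cloud_mask(image, threshold):
    """True where a pixel is at least as bright as the threshold."""
    return [[value >= threshold for value in row] for row in image]


def cloudy_regions(mask):
    """Group 4-connected cloudy pixels into sets of (row, col)."""
    seen, regions = set(), []
    for r, row in enumerate(mask):
        for c, cloudy in enumerate(row):
            if not cloudy or (r, c) in seen:
                continue
            region, stack = set(), [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x),
                               (y, x - 1), (y, x + 1)):
                    if (0 <= ny < len(mask) and 0 <= nx < len(mask[ny])
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            regions.append(region)
    return regions


# Synthetic 6x6 "image" of brightness counts (made-up values).
image = [
    [200, 200, 10, 10, 10, 10],
    [200, 200, 10, 10, 10, 10],
    [10, 10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10, 10],
    [10, 10, 10, 10, 180, 180],
    [10, 10, 10, 10, 180, 180],
]
sampled = sample_every_nth(image, 2)
regions = cloudy_regions(cloud_mask(sampled, threshold=128))
print(len(sampled), len(regions))
```

Sampling every second pixel reduces the 36 values to 9 while still
separating the two bright areas; a larger N would eventually merge
or miss features, which is the trade-off noted above.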

Once the cloudy regions have been determined, a large-scale
pattern-recognition neural net could be used to provide an
initial classification. Such a net would likely be quite similar
to the one presented in the previous section. The net would have
to be trained on a data set that includes any large-scale cloud
features, including those apparently large-scale features that
are in reality contiguous smaller-scale features. Consider,
however, that the image area that must be analyzed has been
reduced by the omission of predominantly noncloudy regions. In
addition, it seems likely that the analysis of a large, cloudy
region would still not require the processing of every pixel in
the region. Thus, the CIFRS approach would establish a
significant reduction in the amount of data to be processed.

Given a large-scale identification of cloudy regions, a
logic-based system would be used to expand the analysis to
include smaller-scale features. An example of such a system is
the SIAMES expert system (Peak, 1989). SIAMES uses information
about the known feature classifications to initiate a search for
associated smaller-scale features that might be present. For
example, once SIAMES knows there is a front, it looks for open
cells behind the front. This top-down approach accomplishes a
further reduction of the data because specific locations are
chosen for the search. In CIFRS, a neural net would be trained
for analysis of each such subregion. For example, a post-frontal
neural net would look for open or closed cells, small-scale
vortices, etc. An additional task at this stage would be the
analysis of details of the large-scale features. Examples might
be the sharpness of a frontal band edge or the presence of
transverse bands in cirrus.
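
The top-down step described above can be pictured as a lookup from
each identified large-scale feature to the follow-up analyses it
suggests. The sketch below is only a toy illustration: it is not
SIAMES logic, the table holds just the examples named in the text,
and the function name and feature locations are hypothetical.

```python
# Follow-up analyses mentioned in the text for each large feature.
FOLLOW_UP = {
    "front": ["open cells", "closed cells", "small-scale vortices",
              "band edge sharpness"],
    "cirrus": ["transverse bands"],
}


def plan_searches(large_features):
    """Return (feature, location, follow_up) tasks to be carried out.

    large_features is a list of (feature, location) pairs from the
    large-scale classification; features without an entry in the
    table generate no further searches.
    """
    tasks = []
    for feature, location in large_features:
        for follow_up in FOLLOW_UP.get(feature, []):
            tasks.append((feature, location, follow_up))
    return tasks


# Hypothetical output of the large-scale classification step,
# with locations given as (latitude N, longitude W).
identified = [("front", (42.5, 172.5)), ("vortex", (27.5, 147.5))]
for task in plan_searches(identified):
    print(task)
```

Only the locations attached to identified features are searched,
which is the source of the further data reduction noted in the text.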

The envisioned architecture is presented in Figure 14. The pixel
data are first reduced by some statistical method in the Data
Reducer. This reduced data set forms the input to the Cloud
Determiner. This module might attempt a cloud/no-cloud
categorization, or possibly a cloud classification such as does
the neural network methodology of Lee et al. (1990). The output
of this analysis step becomes the input to the Large Feature
Identifier. This module would be very similar to that presented
in this study. However, the system should be designed to have its
results checked by the next module, the Analysis Supervisor.
Since neural nets never have to stop learning, if subsequent
analysis proves that a large-feature classification was wrong,
that network could learn from its mistake.

Figure 14. Proposed CIFRS architecture.

The Analysis Supervisor would be responsible for overseeing the
more detailed analysis. Like SIAMES, it would use contextual
information and preliminary conclusions to generate hypotheses
about the remaining image features and also generate more
detailed analysis of the identified features. It is unclear
whether the Supervisor would be a knowledge-based system or a
neural net. To accomplish the verification of these hypotheses, a
suite of Small Feature Identifiers would be available. The
Supervisor would determine which to use and at what locations.
Based on the growing knowledge provided by the continuing
analysis, the Supervisor might generate further uses for the
Small Feature Identifiers, or might alter its earlier
conclusions. In particular, large-scale features that are
actually connected smaller features would be identified as such.

Eventually, the goal is to provide a meteorological
interpretation of the image features. At this early stage, it is
somewhat difficult to determine how much of the CIFRS processing
will proceed top-down or bottom-up. It seems likely that data
will be processed in both directions repetitively; as more
features are identified, the increased information will generate
a search for more detail about what is known and for new features
not yet recognized. The CIFRS process appears to be similar to
the way humans interpret an image. We certainly do not begin by
analyzing every tiny detail; rather, a few large features catch
our attention first. Based on what these features might be, we
generate in our minds hypotheses about the overall image and
begin to search for evidence to verify or disprove those
hypotheses. Thus, we only examine in detail the portions of the
image that give us the information that we need. Granted,
sometimes scanning an area that we have no particular interest in
will uncover an interesting feature. After the CIFRS
knowledge-based interpretation is complete, the system could then
include such scanning of areas not yet processed using neural
networks (Fig. 14). The time and detail devoted to this search
would depend on the computer resources available, but since much
of the image will already be interpreted the scanning should not
have to cover a large area.

The CIFRS system would involve many neural nets tailored toward
different tasks. The research involved in constructing such a
complete system is extensive. However, the results of the initial
tests presented here provide a good starting point in developing
CIFRS. The first task should probably be to develop a more
sophisticated neural identifier of large features. If possible,
actual satellite image data should be used. Thus, the problem of
determining how much pixel data to ignore should also be
addressed. Once the large-scale features are being recognized,
the system can be expanded to include more and more detailed
analysis.

6.  Conclusions  and  Recommendations 

Neural networks are a powerful type of computer model in which
many nonlinear processing elements are arranged in parallel
networks. These networks, based on current understanding of
biological nervous systems, have proven useful in pattern
recognition problems.

In this paper, six neural networks that perform classification
have been investigated. The multi-layer perceptron network was
judged to be the most applicable to the classification of
satellite cloud images. The mathematical basis for this neural
network model has been presented and a version of the model,
called "NOARL-NEURAL," has been programmed.

In simple tests on geometric and alphabetic shapes, the
multi-layer perceptron neural nets were shown to have high levels
of pattern classification skill. A more complex test using GOES
infrared imagery showed that a neural network can be used to
distinguish between fronts, cirrus and cloud vortices. Of the 54
cloud patterns in this test, only one dependent sample pattern
was incorrectly classified.

The surprisingly good results of this limited experiment have
led to a proposed architecture for automated cloud feature
recognition. The system, called CIFRS (Cloud Image Feature
Recognition System), would involve several neural networks
tailored to the interpretation of different types of cloud
features. These network types would be based on feature size and
location relative to other known features. The complete system
would likely include statistical and knowledge-based components
to accomplish data reduction and to direct the search process.
The proposed architecture includes both top-down and bottom-up
processing such that the problem is reduced to a manageable size.
The continuous feedback between the CIFRS modules should make the
system less sensitive to classification errors made during any
one step in the process.

The next step in the process should be to refine the large-scale
feature identification experiment presented in this study. An
attempt should be made to derive a network that classifies a
wider range of large-scale features. Possible input modifications
would include different resolutions, multiple-channel information
and cloud classifications. Since CIFRS requires a skillful
large-feature classification, the development of this module is a
natural first step.